We study the evolutionary dynamics of a maladapted population of self-replicating sequences
on strongly correlated fitness landscapes. Each sequence is assumed to be composed of blocks of
equal length and its fitness is given by a linear combination of four independent block fitnesses. A
mutation affects the fitness contribution of a single block, leaving the other blocks unchanged and
hence inducing correlations between the parent and mutant fitness. On such strongly correlated
fitness landscapes, we calculate dynamical properties such as the number of jumps in the most
populated sequence and the temporal distribution of the last jump, which is shown to exhibit an
inverse square dependence as in evolution on uncorrelated fitness landscapes. We also obtain exact
results for the distribution of records and extremes for correlated random variables.

I. INTRODUCTION



Fitness is a measure of an organism's ability to survive and reproduce [1, 2]. Fit organisms produce more offspring
and can dominate the population while the less fit ones can be lost. Mathematically, fitness is a non-negative real
number associated with a sequence, which is a string of L letters whose meaning is context dependent. For example,
fitness represents the stability of a sequence of amino acids in the case of proteins, the activity for an enzyme or the
replication rate for a genetic sequence of nucleotides. On plotting the fitness as a function of the sequence, a fitness
landscape is obtained. Empirical measurement of fitness landscapes is very hard since the number of sequences increases
exponentially with the sequence length L. However, several qualitative features, particularly the topography of the
fitness landscapes, have been deduced in experiments on proteins and microbes either by an explicit construction of the
fitness landscapes for small L or by indirect measurements of relevant quantities. These experiments show that the
fitness landscapes can have smooth hills as evidenced by fast adaptation in some proteins [3] or multiple peaks as seen
in microbial populations that evolve towards different fitness maxima [4-6] and enzymes with short uphill paths to
the global fitness peak [7]. Detailed studies in which all or a set of mutants from wild type to an optimum are created
and their fitness measured [1] have also indicated the smooth [9] and rugged [13, 11] nature of the fitness landscapes.
The topography of the fitness landscapes is related to the correlations between the fitness of the sequences. If the
fitness of the mutants of a sequence is correlated to that of the sequence so that the mutant fitness does not differ
appreciably from the parent sequence, a smooth fitness landscape is generated, whereas if the mutant fitnesses are
independent random variables so that the fitness of one sequence has no influence over the fitness of other sequences
differing from it by even a single mutation, a highly rugged fitness landscape with multiple optima is obtained. Several
theoretical models such as the NK model [12], the block model [7] and the rough Mt. Fuji-type model [13], in which
correlations can be tuned via a parameter, have been proposed. Although realistic fitness landscapes exhibit an
intermediate degree of correlations [14], much of the theoretical work has focused on the limiting cases of fitness
landscapes with a high degree of correlation but a single fitness peak [15] and no correlations but a large number of
local optima [16-18].

In this article, we study the evolutionary dynamics on the fitness landscapes generated by the block model [3] in
which a sequence of length L is assumed to be composed of independent units or blocks of length $\ell$. As explained later
(see Sec. II), the correlations can be varied by changing the block length $\ell$ from the maximally correlated case with
$\ell = 1$ to the maximally uncorrelated one with $\ell = L$. Here we focus on the block model with $\ell = 2$, which generates
fitnesses that are strongly correlated but to a lesser degree than the maximally correlated case, and the fitness
landscape is moderately rugged i.e. exhibits several peaks.

The evolution model that we work with here describes the deterministic evolution of an infinitely large population
of asexually replicating sequences. In this model, the population is initially distributed in such a manner that the
high fitness sequences have lower initial population and vice versa, but the logarithm of the population of every
sequence increases linearly with time [16]. As time goes on, a highly fit subpopulation is able to overcome the poor
initial condition and dominate the population until an even fitter population overtakes it. This process goes on until
the globally fittest sequence becomes the most populated one. The stepwise dynamics of such leadership changes, termed
jumps, have been studied when the fitness variables are completely uncorrelated [16-18]; here we are interested in this
problem when the fitnesses are strongly correlated. As explained in the next section, in the context of this problem, it
is also relevant to consider the sequence with the largest fitness amongst sequences carrying D mutations relative to a
reference sequence and whose fitness is a record in that it exceeds the fitness of all the sequences with fewer than D
mutations. Thus we are led to study the statistics of maxima [13] and records [21] when the random variables are not
independent, both of which have been much less studied than the corresponding problems for independently distributed
random variables.

Our detailed analysis presented in Sec. IV shows that the statistical properties studied depend only on whether
the number of mutations D is odd or even and whether D lies below or above L/2. This simplification allows
us to tackle the problem analytically and to find exact expressions for various quantities. On uncorrelated fitness
landscapes, it has been shown that the average number of leadership changes increases as $\sqrt{L}$ and the timing of the
last jump exhibits an inverse square ($t^{-2}$) dependence [13]. For evolution on the class of strongly correlated fitness
landscapes studied here, we find that the average number of jumps is a constant independent of L but the time dependence
of the distribution of the last jump remains unaffected. The average number of records is found to increase linearly with
L as in the maximally rugged case, albeit with a larger prefactor.



II. SHELL MODEL ON CORRELATED FITNESS LANDSCAPES 



Consider a microbial population evolving in a complex environment that can be modeled by rugged fitness landscapes.
At large times, most of the population resides at the globally fittest sequence of the fitness landscape and, due
to mutations, a suite of mutants is also present. If the population size is infinite, a nonzero population is present at all
the sequences whereas a finite population produces only a small number of mutants around the fittest sequence [19].
Now if the environment is changed by changing (say) the nutrient medium of the microbial population, the fittest
subpopulation before the environment change will typically be maladapted to the new environment and, depending
on the total population size, a small population may be present at the new fittest sequence. We are interested in
finding how the new global maximum is reached starting with an initial condition in which all the population is at
the sequence that was globally fittest before the environmental change. The exact evolutionary dynamics of the average
Hamming distance and overlap function has been studied on permutationally invariant [22] and uncorrelated [21]
fitness landscapes. Here we will be tracking the evolution of the most populated sequence in time on strongly correlated
fitness landscapes. The dynamics of the adaptation process is studied in the setting when the population size is
infinite so that the fluctuations in the population frequency of a sequence can be neglected and one can work with the
averages. In the following, we begin with the quasispecies model of biological evolution [23, 24] and proceed to relate
it to the shell model [16]. We then define and explain some properties of the block model [3] of correlated fitness
landscapes that we shall use in the paper.

We consider an infinitely large population of binary sequences where a sequence $\sigma = \{\sigma_1, ..., \sigma_L\}$, $\sigma_i = 0, 1$ is a
string of L letters. The population evolves by the elementary processes of replication and mutation. If the fitness $A(\sigma)$
of the sequence $\sigma$ is defined as the average number of copies produced per generation and $p_{\sigma \leftarrow \sigma'}$ is the probability that
a sequence $\sigma'$ mutates to the sequence $\sigma$ at a Hamming distance $D(\sigma, \sigma') = \sum_{i=1}^{L} (\sigma_i - \sigma'_i)^2$ from it, the population
fraction $X(\sigma, t)$ of sequence $\sigma$ at time t evolves according to the following quasispecies equation [17, 23]:

$$X(\sigma, t+1) = \frac{\sum_{\sigma'} p_{\sigma \leftarrow \sigma'} A(\sigma') X(\sigma', t)}{\sum_{\sigma'} A(\sigma') X(\sigma', t)} \qquad (1)$$

where the denominator on the right hand side ensures that the normalisation condition $\sum_{\sigma} X(\sigma, t) = 1$ is satisfied at
all times. Assuming that the mutations occur independently at each locus with a probability $\mu$, the mutational
probability is $p_{\sigma \leftarrow \sigma'} = \mu^{D(\sigma, \sigma')} (1 - \mu)^{L - D(\sigma, \sigma')}$. In the following discussion, we will use the unnormalised population
$Z(\sigma, t)$ defined through the relation $X(\sigma, t) = Z(\sigma, t) / \sum_{\sigma'} Z(\sigma', t)$ as it obeys a linear equation given by

$$Z(\sigma, t+1) = \sum_{\sigma'} \mu^{D(\sigma, \sigma')} (1 - \mu)^{L - D(\sigma, \sigma')} A(\sigma') Z(\sigma', t) \qquad (2)$$



As discussed at the beginning of this section, we consider the evolution of the dominant population starting with the
initial condition $X(\sigma, 0) = Z(\sigma, 0) = \delta_{\sigma, \sigma^{(0)}}$ where $\sigma^{(0)}$ is the fittest sequence before the change in the environment.
Earlier work [16-18] has shown that the statistical properties of the most populated sequence in the quasispecies
model are accurately described by a simplified shell model which approximates the solution of (2) by

$$Z(\sigma, t) = \mu^{D(\sigma, \sigma^{(0)})} A^{t}(\sigma) \qquad (3)$$

The above equation can be heuristically obtained as follows: on iterating (2) with the given initial condition, the
population $Z(\sigma, 1) \approx \mu^{D(\sigma, \sigma^{(0)})} A(\sigma^{(0)})$ for small $\mu$. Thus all the mutants become available in one generation for an
infinitely large population even after starting with a highly localised population. If the mutations are neglected for
further evolution i.e. $Z(\sigma, t+1) = A(\sigma) Z(\sigma, t)$, the solution (3) is immediately obtained. A detailed analysis has
shown that the behavior of $Z(\sigma, t)$ in the shell model matches the quasispecies dynamics (2) only for highly fit sequences
and at short times. However, it captures the behavior of the most populated genotype exactly at all times [18] and
therefore we will work with the shell model in the rest of the article.

Taking the logarithm of both sides of (3) and rescaling the time by $|\ln \mu|$, the logarithmic population
$E(\sigma, t) = \ln Z(\sigma, t) / |\ln \mu|$ is seen to increase linearly in time with a slope $\ln A(\sigma) = w(\sigma)$,

$$E(\sigma, t) = -D(\sigma, \sigma^{(0)}) + w(\sigma) t \qquad (4)$$

According to the above equation, there are $\binom{L}{D}$ populations in a shell of radius D from the initial sequence which have
the same initial condition but different growth rates. As the fittest population in each shell grows the fastest, one can
work with the largest fitness $w^{(\max)}(D)$ in each shell. Labeling the fittest sequence in a shell by its shell number, (4)
can be rewritten as

$$E(D, t) = -D + w^{(\max)}(D) t \qquad (5)$$

Thus we arrive at a model in which the fitness variables $w^{(\max)}(D)$ are independent but non-identically distributed.
We mention in passing that the above linear dynamics when the slope variables are independent and identically
distributed (i.i.d.) have appeared in a shell model with one-dimensional fitness [16], a gas of elastically colliding hard
core particles [l^ and a spin glass model with random entropy [13].

As mentioned earlier, we are mainly interested in the dynamics of the most populated sequence whose fitness
changes abruptly or jumps in time. Due to (5), the leader in shell D' is overtaken by a fitter population in shell
D > D' at time T(D, D') given by

$$T(D, D') = \frac{D - D'}{w^{(\max)}(D) - w^{(\max)}(D')} \qquad (6)$$

Initially the sequence $\sigma^{(0)}$ is the leader. As the overtaking time must be positive, the population in shell D = 1
can be a leader provided $w^{(\max)}(1) > w^{(\max)}(0)$. Similarly, the fittest sequence in shell 2 can be the most populated
sequence if $w^{(\max)}(2) = \max\{w^{(\max)}(0), w^{(\max)}(1), w^{(\max)}(2)\}$. In general, a population at Hamming distance D > 0
has a chance of becoming a leader only if its fitness is greater than that of all the other populations at Hamming
distance D' < D or in other words, the fitness in shell D is a record. As noted in earlier works [16], it is not sufficient
to be a record fitness in order that the corresponding sequence can be the dominant sequence: a jump
occurs only when the current leader is overtaken in minimum time. Due to this constraint, not all record sequences
participate in the jump process and thus the number of records is an upper bound on the number of jumps.
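
The distinction between records and jumps can be made concrete with a minimal Python sketch of the linear dynamics $E(D, t) = -D + w^{(\max)}(D) t$. The function names `count_records` and `leader_path` and the list `w` of largest shell fitnesses are illustrative assumptions, not notation from the text; exact ties in the overtaking time are ignored.

```python
def count_records(w):
    """Count shells whose fitness exceeds that of every earlier shell (shell 0 included)."""
    best = float("-inf")
    records = 0
    for fitness in w:
        if fitness > best:
            best = fitness
            records += 1
    return records


def leader_path(w):
    """Shells that successively hold the largest E(D, t) = -D + w[D] * t, starting from shell 0."""
    path = [0]
    current = 0
    while True:
        best_time = None
        best_shell = None
        for d in range(current + 1, len(w)):
            if w[d] <= w[current]:
                continue  # a slower-growing shell can never overtake the current leader
            # overtaking time T(D, D') = (D - D') / (w(D) - w(D'))
            t = (d - current) / (w[d] - w[current])
            if best_time is None or t < best_time:
                best_time, best_shell = t, d
        if best_shell is None:
            return path  # no fitter shell is left, so the leader is final
        path.append(best_shell)
        current = best_shell
```

For the illustrative input `w = [1.0, 1.5, 1.2, 3.0]`, shells 0, 1 and 3 are records, but shell 3 overtakes shell 0 before shell 1 does, so only one jump occurs.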

We next define the block model introduced by Perelson and Macken, who were motivated by the observation that
many biomolecules such as proteins and antibodies are composed of domains or partitions [7]. As shown in Fig. 1,
a sequence of length L is divided into B independent blocks of equal length $\ell = L/B$, $1 \leq \ell \leq L$. Each block
configuration is assigned a fitness value which may also depend on the position of the block (locus-dependent block
fitness model). In this article, we assume that a block configuration at any location in the sequence carries the same
block fitness (see also Sec. V). These $2^{\ell}$ block fitnesses are chosen independently from a common distribution with
support on the interval $[l, u]$ where l and u are respectively the lower and upper limits of the block fitness distribution.
The sequence fitness is given by the average of the corresponding block fitnesses.
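
As an illustration of this definition, the following minimal Python sketch assigns one fitness to each of the $2^{\ell}$ block configurations and averages over the blocks of a binary sequence. The helper names `make_block_fitness` and `sequence_fitness`, and the choice of exponentially distributed block fitnesses, are assumptions made for the example only.

```python
import random


def make_block_fitness(ell, rng):
    """One i.i.d. fitness per block configuration (exponential distribution assumed here)."""
    return [rng.expovariate(1.0) for _ in range(2 ** ell)]


def sequence_fitness(seq, ell, block_fitness):
    """Average of the block fitnesses of a binary sequence whose length is a multiple of ell."""
    if len(seq) % ell != 0:
        raise ValueError("sequence length must be a multiple of the block length")
    total = 0.0
    for start in range(0, len(seq), ell):
        # read the block as a binary number: for ell = 2, {0,0}->0, {0,1}->1, {1,0}->2, {1,1}->3
        index = 0
        for bit in seq[start:start + ell]:
            index = 2 * index + bit
        total += block_fitness[index]
    return total / (len(seq) // ell)
```

Because the same table is used at every block position, two sequences that share blocks automatically have correlated fitnesses, which is the mechanism discussed in the text.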

FIG. 1: (Color online) Left panel: Block model for $\ell = 2$. The initial sequence and its mutant have correlated fitnesses as
they have several blocks in common. Right panel: Average number $n_{opt}$ of local maxima in the block model as a function of
$\ell$ for various L.

The topographical features such as the number of local maxima depend on $\ell$. For a sequence to be a local maximum,
each of its B blocks must also be a local maximum. Since a sequence is composed of independent blocks and the
average number of local optima of a sequence of length $\ell$ with i.i.d. fitness is $2^{\ell}/(\ell + 1)$, it follows that the average
number $n_{opt}$ of local maxima of a sequence of length L and block length $\ell$ is given by $(2^{\ell}/(\ell + 1))^{L/\ell}$. Except
for $\ell = 1$ for which there is a single local (same as global) fitness peak, $n_{opt}$ increases with increasing $\ell$ and L (see
Fig. 1). For $\ell = 2$ with which we work in this article, there are $\approx 1.15^{L}$ local optima on average. Arguing as above
for a local maximum, it can be seen that the globally fittest sequence is composed of identical blocks with the largest
block fitness and has the average fitness given by $2^{\ell} \int_l^u df \, f \, p(f) \left[ \int_l^f df' \, p(f') \right]^{2^{\ell} - 1}$. Thus the initial sequence $\sigma^{(0)}$ can be
chosen to be any one of the $2^{\ell}$ sequences with the same blocks. For convenience, we choose it to be the one with all 0s.
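
The quoted count of local maxima is easy to evaluate numerically. The short Python sketch below simply codes the expression $(2^{\ell}/(\ell+1))^{L/\ell}$ stated above; the function name `mean_local_optima` is an illustrative choice.

```python
def mean_local_optima(L, ell):
    """Average number of local maxima, (2**ell / (ell + 1)) ** (L / ell), as quoted in the text."""
    return (2 ** ell / (ell + 1)) ** (L / ell)
```

For $\ell = 1$ the expression equals one for every L, while for $\ell = 2$ the growth factor per site is $(4/3)^{1/2} \approx 1.15$.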

An attractive feature of the block model is that the correlations can be tuned with the block length $\ell$. As illustrated
in Fig. 1, when two sequences have at least one common block, their respective fitnesses are correlated. For $\ell = 1$, the
sequence fitnesses are maximally correlated while for $\ell = L$, we obtain the model with maximally uncorrelated fitnesses.
This statement can be quantified by considering the correlation function $C_{0j}$ between the fitness $w_0 = w(\sigma^{(0)})$ of the
initial sequence and the fitness $w_j$ of a sequence at Hamming distance D = 1 from it given by

$$w_j = \frac{(L - \ell) f_0 + \ell f_j}{L}, \qquad j = 0, ..., \ell \qquad (7)$$

where $f_j$ is the fitness of the block of length $\ell$ with 1 in the jth position. Using the fact that the $f_j$'s are i.i.d. variables,
we can write the correlation function for $j \neq 0$ as [7]

$$C_{0j} = \langle w_0 w_j \rangle - \langle w_0 \rangle \langle w_j \rangle = \frac{L - \ell}{L} \sigma^2 \qquad (8)$$

where $\sigma^2$ is the variance of the block fitness distribution p(f). The above correlation function is largest at $\ell = 1$ and
vanishes at $\ell = L$. Similarly the correlation function $C_{ij}$ amongst the one-mutant neighbors given by [28]

$$C_{ij} = \langle w_i w_j \rangle - \langle w_i \rangle \langle w_j \rangle = \left( \frac{L - \ell}{L} \right)^2 \sigma^2 + \left( \frac{\ell}{L} \right)^2 \sigma^2 \delta_{ij} \qquad (9)$$

is a monotonically decreasing function of $\ell$ for $i \neq j$.

In the following, we will study the shell model defined by (4) where the fitness $w(\sigma)$ is chosen from the block model.
In the next section, we briefly discuss the dynamics of the shell model for the two limiting cases, namely $\ell = 1$ and
$\ell = L$. Section IV, which forms the major part of the paper, discusses the evolutionary dynamics when the block length
$\ell = 2$. Finally we conclude with a discussion of our results in Sec. V. In the rest of the article, we will assume that
the sequence length L is an even integer.



III. SHELL MODEL DYNAMICS WHEN BLOCK LENGTH $\ell = 1$ AND $\ell = L$



In this section, we briefly discuss the evolutionary dynamics on the fitness landscapes for the two limits of the block
model, namely block length $\ell = 1$ and $\ell = L$. When the block length is equal to one, the sequence fitnesses are maximally
correlated. Let $f_0$ and $f_1$ denote the block fitness of the two blocks {0} and {1} respectively. Then the fitness w(D)
of a sequence at Hamming distance D from the initial sequence is given by

$$w(D) = \frac{(L - D) f_0 + D f_1}{L} \qquad (10)$$

The fitness landscape thus generated is permutationally invariant since there is a single distinct fitness at each D from
the initial sequence. It is easy to see that the average number of jumps on fitness landscapes with $\ell = 1$ is one half. This
is because if $f_0 > f_1$, a jump cannot occur after D = 0. If $f_1 > f_0$, as the time taken by the population at D > 0 to
overtake the population at D = 0 given by

$$T(D, 0) = \frac{D}{w(D) - w(0)} = \frac{L}{f_1 - f_0} \qquad (11)$$

is independent of D, all the populations overtake E(0, t) at the same time and hence one jump occurs with probability
1/2. Thus the average number of jumps is 1/2 and independent of L. The average number of records from the above
considerations is given by 1 + (L/2).

The opposite limit of maximally uncorrelated fitnesses for which $\ell = L$ has been studied earlier [l^ [13, 25]. It has
been shown that the average number of records is given by $(1 - \ln 2) L$ for any underlying block fitness distribution
[it} and the average number of jumps by $\sqrt{L \pi / 2}$ for exponentially distributed block fitnesses [25].



IV. SHELL MODEL DYNAMICS WHEN BLOCK LENGTH $\ell = 2$



For the rest of the article, we will consider the case when the sequences are built of blocks of length $\ell = 2$. The
block fitness is given by $f_0$, $f_1$, $f_2$ and $f_3$ corresponding to the blocks {0, 0}, {0, 1}, {1, 0} and {1, 1} respectively.
Let $n_i$ denote the number of blocks with fitness $f_i$, i = 0, ..., 3. Then the fitness of a sequence of length L with D
mutations obtained by averaging over B = L/2 block fitnesses can be written as

$$w_{n_1, n_2}(D) = \frac{(L - D - n_1 - n_2) f_0 + 2 n_1 f_1 + 2 n_2 f_2 + (D - n_1 - n_2) f_3}{L} \qquad (12)$$

In the above expression, since the total number of blocks equals L/2 and the Hamming distance D of a sequence from
the initial sequence is given by $n_1 + n_2 + 2 n_3 = D$, we get $n_3 = (D - n_1 - n_2)/2$ and $n_0 = (L - D - n_1 - n_2)/2$. As
$n_0$ and $n_3$ must be integers, for even D, $n_1$ and $n_2$ must be either both even or both odd whereas for odd D, either
$n_1$ should be odd and $n_2$ even or vice versa. Besides, for $D \leq L/2$, the conditions $n_1 + n_2 \leq D$, $n_i \leq D$ must be
satisfied as $n_3 \geq 0$, and for D > L/2, $n_1 + n_2 \leq L - D$, $n_i \leq L - D$ are required to ensure the non-negativity of $n_0$.
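
The constraints on the block counts can be sketched in a few lines of Python. The helpers `admissible_pairs` and `fitness` are hypothetical names, the block fitnesses are passed as a tuple `f = (f0, f1, f2, f3)`, and L is assumed even as in the text.

```python
def admissible_pairs(L, D):
    """All (n1, n2) for which n0 = (L-D-n1-n2)/2 and n3 = (D-n1-n2)/2 are non-negative integers."""
    pairs = []
    for n in range(min(D, L - D) + 1):   # n = n1 + n2 cannot exceed D or L - D
        if (D - n) % 2 == 0:             # n must have the same parity as D (L is even)
            pairs.extend((n1, n - n1) for n1 in range(n + 1))
    return pairs


def fitness(L, D, n1, n2, f):
    """Sequence fitness for block length 2, given block fitnesses f = (f0, f1, f2, f3)."""
    f0, f1, f2, f3 = f
    return ((L - D - n1 - n2) * f0 + 2 * n1 * f1 + 2 * n2 * f2 + (D - n1 - n2) * f3) / L
```

As a quick check of the limiting cases, the sequences at D = 0, D = L/2 with $n_1 = L/2$, and D = L consist of a single block type and have fitness $f_0$, $f_1$ and $f_3$ respectively.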

As mentioned in Sec. II, in order to be the globally fittest sequence, a sequence must be composed of blocks of the
same type. For $\ell = 2$, the global maximum can thus occur at D = 0, L/2 and L corresponding to $f_0$, either $f_1$ or $f_2$,
and $f_3$ being the largest of the block fitnesses respectively. Starting with all the population at the initial sequence,
we wish to find the properties of the jumps by which the most populated sequence reaches the global maximum. In
the following subsections, we discuss the statistics of extremes (Sec. IV A), records (Sec. IV B) and jumps (Sec. IV C)
on correlated fitness landscapes.



A. Distribution of the largest fitness at constant Hamming distance 

It was shown in [28] that the total number of distinct fitnesses at a fixed D increases with D. However, for the questions
of interest here, we need to consider only the sequence with the largest fitness. To identify such sequences, we first consider
fitnesses with fixed $n_1 + n_2 = n$, $n > 0$, where $n_1, n_2$ satisfy the conditions described above. As the coefficients of
$f_0$ and $f_3$ in the fitness $w_{n_1, n_2}$ depend only on $n_1 + n_2$, assuming $f_1 > f_2$ and comparing $w_{n_1, n - n_1}$ and $w_{n_1', n - n_1'}$ for all
$n_1' \neq n_1$, we find that $w_{n_1, n - n_1} > w_{n_1', n - n_1'}$ for $n_1' < n_1$. The fitness $w_{n_1, n - n_1}$ can therefore be the largest of all
the fitnesses at fixed D and n provided $n_1 = n$. We next compare $w_{n, 0}$ and $w_{n + k, 0}$ for $D \geq 2$. Since
$w_{n + k, 0}(D) = w_{n, 0}(D) - (k/L)(f_0 + f_3 - 2 f_1)$, it follows that for $D \leq L/2$, the largest fitness $w^{(\max)}(D)$ is

$$w^{(\max)}(D) = w_{D,0}(D) \quad \text{if } f_1 > f_2 \text{ and } f_0 - 2 f_1 + f_3 < 0 \qquad (13a)$$
$$w^{(\max)}(D) = w_{1,0}(D) \quad \text{if } f_1 > f_2 \text{ and } f_0 - 2 f_1 + f_3 > 0 \text{ and } D \text{ is odd} \qquad (13b)$$
$$w^{(\max)}(D) = w_{0,0}(D) \quad \text{if } f_1 > f_2 \text{ and } f_0 - 2 f_1 + f_3 > 0 \text{ and } D \text{ is even} \qquad (13c)$$

The above conditions are independent of D (except for the parity) and as we shall see, this property simplifies the
problem considerably. For D > L/2, the largest possible fitness is obtained on replacing D by L - D in the above
discussion. The corresponding conditions for the case when $f_1 < f_2$ are obtained by interchanging the fitnesses $f_1$ and $f_2$
and the indices $n_1$ and $n_2$ in the preceding equation.
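
The selection rule for the largest fitness in a shell can be checked against a brute-force maximisation over all admissible block counts. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, L is even, and for D > L/2 the role of D in case (13a) is taken by L - D as described above.

```python
import random


def fitness(L, D, n1, n2, f):
    """Sequence fitness for block length 2, given block fitnesses f = (f0, f1, f2, f3)."""
    f0, f1, f2, f3 = f
    return ((L - D - n1 - n2) * f0 + 2 * n1 * f1 + 2 * n2 * f2 + (D - n1 - n2) * f3) / L


def brute_force_max(L, D, f):
    """Largest fitness in shell D found by scanning every admissible (n1, n2)."""
    best = float("-inf")
    for n in range(D % 2, min(D, L - D) + 1, 2):   # n = n1 + n2 has the parity of D
        for n1 in range(n + 1):
            best = max(best, fitness(L, D, n1, n - n1, f))
    return best


def closed_form_max(L, D, f):
    """Largest fitness in shell D from the sign of f0 - 2*max(f1, f2) + f3 and the parity of D."""
    f0, f1, f2, f3 = f
    if f0 - 2 * max(f1, f2) + f3 < 0:
        n = min(D, L - D)   # as many of the fitter single-mutant blocks as possible
    else:
        n = D % 2           # as few as parity allows: 0 for even D, 1 for odd D
    return fitness(L, D, n, 0, f) if f1 > f2 else fitness(L, D, 0, n, f)
```

Both routines agree for random block fitnesses because, at fixed $n_1 + n_2$, the fitness is largest when all weight sits on the fitter of $f_1$ and $f_2$, and is then linear in $n_1 + n_2$.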
FIG. 2: (Color online) Distribution $P_E(w, D)$ of maximum fitness for (a) r = 0.1 (solid) and r = 0.4 (broken) with $\delta = 1$ and
(b) $\delta = 1$ (solid) and $\delta = 2$ (broken) with r = 0.1.



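We consider the cumulative distribution $\mathcal{P}_E(w, D)$ that all the fitnesses at constant D are smaller than w. As
argued above, for even $D \leq L/2$, only one of the three fitnesses $w_{D,0}$, $w_{0,D}$ and $w_{0,0}$ can be the largest. For an unbounded
underlying distribution p(f) with f > 0, we can thus write

$$\mathcal{P}_E(w, D) = \int_0^{\infty} df_0 \, p(f_0) \int_0^{\infty} df_1 \, p(f_1) \int_0^{\infty} df_2 \, p(f_2) \int_0^{\infty} df_3 \, p(f_3) \, \Theta(w - w_{D,0}) \Theta(w - w_{0,D}) \Theta(w - w_{0,0}) \qquad (14)$$

$$= \int_0^{w/(1-r)} df_0 \, p(f_0) \int_0^{[w - (1-r) f_0]/r} df_3 \, p(f_3) \left[ \int_0^{[w - (1-2r) f_0]/(2r)} df_1 \, p(f_1) \right]^2 \qquad (15)$$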



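where $\Theta(...)$ is the Heaviside step function and $r = D/L \leq 1/2$. Specifically, for $p(f) = \delta f^{\delta - 1} e^{-f^{\delta}}$, $\delta > 0$, we have

$$\mathcal{P}_E(w, D) = \delta \int_0^{w/(1-r)} df \, f^{\delta - 1} e^{-f^{\delta}} \left[ 1 - e^{-\left( \frac{w - (1-r) f}{r} \right)^{\delta}} \right] \left[ 1 - e^{-\left( \frac{w - (1-2r) f}{2r} \right)^{\delta}} \right]^2 \qquad (16)$$

$$= \int_0^{a} df \, e^{-f} \left[ 1 - e^{-\left( \frac{w - (1-r) f^{1/\delta}}{r} \right)^{\delta}} \right] \left[ 1 - e^{-\left( \frac{w - (1-2r) f^{1/\delta}}{2r} \right)^{\delta}} \right]^2 \qquad (17)$$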


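where $a = (w/(1 - r))^{\delta}$. The probability $P_E(w, D) = d\mathcal{P}_E/dw$ that the largest sequence fitness with D mutations has
a value w can be easily computed for $\delta = 1$ and is given by

$$P_E(w, D) = \frac{e^{-\frac{w}{1-r}} + 2 e^{-\frac{3w}{2r}} - e^{-\frac{2w}{r}}}{1 - 2r} - \frac{2 e^{-\frac{w}{2r}}}{1 - 4r} + \frac{r \left( e^{-\frac{w}{r}} - e^{-\frac{2w}{1-r}} \right)}{1 - 5r + 6r^2} + \frac{4 r e^{-\frac{3w}{2(1-r)}}}{1 - 6r + 8r^2} \qquad (18)$$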



The above distribution, shown in Fig. 2(a) for two values of r, shifts towards the right with increasing r as the average
$\langle w \rangle_E = 1 + (55r/36)$ [29]. Figure 2(b) shows that the extreme value distribution at fixed r peaks at larger w as $\delta$
increases. This is contrary to the general expectation that if the tail of the underlying distribution decays fast, the
probability of finding a large maximum value of a set of S random variables should also decrease when $S \gg 1$. Here, as
the number of independent random variables is merely four, the tail of the block fitness distribution is not adequately
sampled and the block fitnesses lie closer to the average value, which increases with increasing $\delta$, thus resulting in the
behavior seen for $P_E(w, D)$.
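
The quoted average of the extreme value distribution for exponentially distributed block fitnesses ($\delta = 1$) can be checked by direct sampling. The following Python sketch assumes even $D \leq L/2$ with $r = D/L$, for which the three candidate fitnesses are $(1-2r) f_0 + 2r f_1$, $(1-2r) f_0 + 2r f_2$ and $(1-r) f_0 + r f_3$; the function names, sample size and seed are arbitrary choices made for illustration.

```python
import random


def sample_max_fitness(r, rng):
    """One draw of the largest fitness at r = D/L (even D <= L/2) for exponential block fitnesses."""
    f0, f1, f2, f3 = (rng.expovariate(1.0) for _ in range(4))
    w_single = (1 - 2 * r) * f0 + 2 * r * max(f1, f2)   # larger of w_{D,0} and w_{0,D}
    w_double = (1 - r) * f0 + r * f3                    # w_{0,0}
    return max(w_single, w_double)


def mean_max_fitness(r, n_samples=200000, seed=1):
    """Monte Carlo estimate of the average largest fitness."""
    rng = random.Random(seed)
    return sum(sample_max_fitness(r, rng) for _ in range(n_samples)) / n_samples
```

The estimate can then be compared with $1 + 55r/36$, for instance at r = 0.1 and r = 0.4; agreement is expected only up to the statistical error of the sample.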

When D is odd, one can write down an expression for the extreme distribution but for large L, it reduces to that
obtained for even D. The results for extreme statistics when D > L/2 can be obtained on replacing D by L - D in
the above discussion [29].
B. Statistics of record fitnesses

In this subsection, we are interested in finding the probability that a fitness $w_{n_1, n_2}(D)$ is a record i.e. it exceeds
all the fitnesses in the shells D' < D. As only the largest fitness at constant D can possibly be a record, we need to
consider only such fitnesses. Unless otherwise mentioned, we assume $f_1 > f_2$ so that the largest fitness at constant D
can be one of the following: $w_{0,0}(D)$ if D is even and $w_{1,0}(D)$ otherwise, $w_{D,0}(D)$ for $D \leq L/2$ and $w_{L-D,0}(D)$ for
D > L/2.

For $D \leq L/2$, the fitness $w_{D,0}(D)$ can be a record if it exceeds all the fitnesses at constant D as well as the ones
with number of mutations D' < D. The first condition is met if (13a) is satisfied. As the conditions in (13a) are
independent of D (barring parity), the largest fitness in a shell with D' mutations is also $w_{D',0}(D')$, $1 \leq D' < D$.
Then $w_{D,0}(D) > w_{D',0}(D')$ for all $D' \geq 0$ if $f_1 > f_0$. Thus the probability of $w_{D,0}(D)$ being a record can be written
as

$$P(w_{D,0} \text{ is a record}) = \int_l^u \prod_{i=0}^{3} df_i \, p(f_i) \, \Theta(f_1 - f_0) \Theta(f_1 - f_2) \Theta(2 f_1 - f_0 - f_3) \qquad (19)$$

$$= \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \int_l^{2 f_1 - f_0} df_3 \, p(f_3) \qquad (20)$$

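For D > L/2, the fitness $w_{L-D,0}(D)$ can be a record if $w_{L-D,0}(D) > w_{L-D',0}(D')$ for D' > L/2 and $w_{L-D,0}(D) >
w_{D',0}(D')$ for $D' \leq L/2$, along with the conditions $f_1 > f_2$ and $f_0 - 2 f_1 + f_3 < 0$ (see (13a)). The first two inequalities
are satisfied if $f_3 > f_1$ and $f_0 < f_1$. Thus we can write

$$P(w_{L-D,0} \text{ is a record}) = \int_l^u \prod_{i=0}^{3} df_i \, p(f_i) \, \Theta(f_1 - f_0) \Theta(f_1 - f_2) \Theta(f_3 - f_1) \Theta(2 f_1 - f_0 - f_3) \qquad (21)$$

$$= \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \int_{f_1}^{2 f_1 - f_0} df_3 \, p(f_3) \qquad (22)$$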

For even D, the fitness $w_{0,0}(D)$ can be a record if $w_{0,0}(D) > w_{0,0}(D')$ for even D' and $w_{0,0}(D) > w_{1,0}(D')$ for odd
D', besides satisfying (13c). If $f_2 < f_1$, the fitness $w_{0,0}(D)$ can be a record if $f_3 > f_0$ and $f_3 > 2 f_1 - f_0$. The last
two conditions can be split into two cases, namely $f_3 > f_0$ if $f_0 > f_1$ and $f_3 > 2 f_1 - f_0$ if $f_0 < f_1$. Similarly, for
$f_2 > f_1$, the conditions for $w_{0,0}(D)$ to be a record are obtained by interchanging $f_2$ and $f_1$. Combining all the above
conditions, we get

$$P(w_{0,0} \text{ is a record}) = 2 \int_l^u \prod_{i=0}^{3} df_i \, p(f_i) \, \Theta(f_1 - f_2) \Theta(f_3 - f_0) \Theta(f_0 + f_3 - 2 f_1) \qquad (23)$$

$$= 2 \left[ \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \int_{2 f_1 - f_0}^{u} df_3 \, p(f_3) + \int_l^u df_3 \, p(f_3) \int_l^{f_3} df_0 \, p(f_0) \int_l^{f_0} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \right] \qquad (24)$$

For odd D, the fitness $w_{1,0}(D)$, D > 1 can be a record if (13b) is satisfied, $w_{1,0}(D) > w_{1,0}(D')$ for odd D' < D and
$w_{1,0}(D) > w_{0,0}(D')$ for even D' < D. The last two conditions are satisfied if $f_0 < f_3$ and $f_0 < f_1$ respectively. Then
the probability of $w_{1,0}(D)$, D > 1 being a record is given by

$$P(w_{1,0} \text{ is a record}) = \int_l^u \prod_{i=0}^{3} df_i \, p(f_i) \, \Theta(f_1 - f_0) \Theta(f_1 - f_2) \Theta(f_3 - f_0) \Theta(f_0 + f_3 - 2 f_1) \qquad (25)$$

$$= \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \int_{2 f_1 - f_0}^{u} df_3 \, p(f_3) \qquad (26)$$

The above expression holds for D = 1 also as $w_{1,0}(1)$ is a record if $w_{1,0}(1) > w_{0,0}(0)$, which implies $f_0 < f_1$ besides
$f_2 < f_1$.
FIG. 3: (Color online) Variation of record occurrence probability $P_R(D)$ with Hamming distance D for L = 64.



1. Record occurrence distribution 

Using the results derived above, we now calculate the probability $P_R(D)$ that a record occurs in the shell with
D > 0 mutations given $P_R(0) = 1$. Figure 3 shows that $P_R(D)$ is not a smooth function: the value of $P_R(D)$ depends
on whether D is odd or even and whether it is below or above L/2. Four distinct cases thus arise,
which we discuss below. We shall find that the distribution $P_R(D)$ is universal i.e. does not depend
on the choice of the underlying distribution of the block fitness. As the global maximum is the last record and the
only global maximum for D > L/2 occurs with probability 1/4, we may expect the record occurrence probability for
D > L/2 to be smaller than that for $D \leq L/2$.

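Even D: When D is even, either $w_{D,0}(D)$ or $w_{0,D}(D)$ can be a record for $D \leq L/2$, $w_{L-D,0}(D)$ or $w_{0,L-D}(D)$ for
D > L/2, or $w_{0,0}(D)$ for any even D. Thus the probability of a shell with even $D \leq L/2$ having a record is given by

$$P_R(D) = 2 P(w_{D,0} \text{ is a record}) + P(w_{0,0} \text{ is a record}) \qquad (27)$$

$$= 2 \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) + 2 \int_l^u df_3 \, p(f_3) \int_l^{f_3} df_0 \, p(f_0) \int_l^{f_0} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \qquad (28)$$

$$= \frac{2}{3} + \frac{1}{12} = \frac{3}{4}, \qquad D \leq L/2 \qquad (29)$$

Similarly for D > L/2, the record occurrence probability is given by

$$P_R(D) = 2 P(w_{L-D,0} \text{ is a record}) + P(w_{0,0} \text{ is a record}) \qquad (30)$$

$$= 2 \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \int_{f_1}^{u} df_3 \, p(f_3) + \frac{1}{12} = \frac{1}{4}, \qquad D > L/2 \qquad (31)$$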

Odd D: For $w_{D,0}(D)$, D > 1 to be a record when D is odd, the same conditions as for even D are required so that
(20) holds. Thus the probability of a shell with odd D, $1 \leq D < L/2$ having a record is given by

$$P_R(D) = 2 \left[ P(w_{D,0} \text{ is a record}) + P(w_{1,0} \text{ is a record}) \right] \qquad (32)$$

$$= 2 \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) = \frac{2}{3}, \qquad D < L/2 \qquad (33)$$

For D > L/2, the probability that $w_{L-D,0}(D)$ is a record is given by (22) and that $w_{1,0}(D)$ is a record by (26). Thus
the probability of a record occurring for odd D > L/2 can be expressed as

$$P_R(D) = 2 \left[ P(w_{L-D,0} \text{ is a record}) + P(w_{1,0} \text{ is a record}) \right] \qquad (34)$$

$$= 2 \int_l^u df_0 \, p(f_0) \int_{f_0}^{u} df_1 \, p(f_1) \int_l^{f_1} df_2 \, p(f_2) \int_{f_1}^{u} df_3 \, p(f_3) = \frac{1}{6}, \qquad D > L/2 \qquad (35)$$
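
The four values of the record occurrence probability can be checked by a small Monte Carlo experiment. The Python sketch below is an illustration under stated assumptions: exponentially distributed block fitnesses, a short sequence with L = 12 in the usage note, and the hypothetical helper names `shell_maxima` and `record_probabilities`. The largest fitness in each shell is found by scanning all admissible values of $n_1 + n_2$ rather than by using the closed-form rule.

```python
import random


def shell_maxima(L, f):
    """Largest fitness in each shell D = 0..L for block length 2 (L even)."""
    f0, f1, f2, f3 = f
    fm = max(f1, f2)   # at fixed n1 + n2, all single-mutant blocks should carry the fitter value
    maxima = []
    for D in range(L + 1):
        maxima.append(max(
            ((L - D - n) * f0 + 2 * n * fm + (D - n) * f3) / L
            for n in range(D % 2, min(D, L - D) + 1, 2)
        ))
    return maxima


def record_probabilities(L, n_samples, seed=0):
    """Fraction of samples in which shell D holds a record, for every D."""
    rng = random.Random(seed)
    counts = [0] * (L + 1)
    for _ in range(n_samples):
        f = [rng.expovariate(1.0) for _ in range(4)]
        best = float("-inf")
        for D, w in enumerate(shell_maxima(L, f)):
            if w > best:   # strictly larger than everything in the earlier shells
                counts[D] += 1
                best = w
    return [c / n_samples for c in counts]
```

With `record_probabilities(12, 20000)`, the entries for D = 4, 3, 10 and 9 can be compared with 3/4, 2/3, 1/4 and 1/6 respectively, up to sampling error. Since the result is expected to be universal, replacing the exponential by another continuous distribution should not change the estimates.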
2. Record value distribution

In this subsection, we calculate the probability $\mathcal{P}_R(w, D)$ that the record value in shell D is smaller than or equal to
w. For this purpose, we will need the probability $\mathcal{P}(w(D) \leq w)$ that the fitness w(D) in shell D is a record that does not
exceed w. As the record value distribution is not expected to be universal, we will restrict ourselves to distributions with
support on the interval $[0, \infty)$. It can be checked that the cumulative distribution $\mathcal{P}_R(w, D)$ gives the probability
$P_R(D)$ obtained in the last subsection when $w \to \infty$. Below we present the expressions for $D \leq L/2$ as the corresponding
distributions for D > L/2 can be written in an analogous manner.

Even D: As seen for the distribution of extreme values in Sec. IV A, the distribution for the record value is a function
of the ratio r = D/L for even D. Since either $w_{D,0}(D)$ or $w_{0,0}(D)$ can be a record for even $D \leq L/2$, the cumulative
probability $\mathcal{P}_R(w, D) = 2 \mathcal{P}(w_{D,0} \leq w) + \mathcal{P}(w_{0,0} \leq w)$ where

$$\mathcal{P}(w_{D,0} \leq w) = \int_0^{\infty} \prod_{i=0}^{3} df_i \, p(f_i) \, \Theta(w - w_{D,0}) \Theta(f_1 - f_2) \Theta(2 f_1 - f_0 - f_3) \Theta(f_1 - f_0)$$

$$= \int_0^{w} df_0 \, p(f_0) \int_{f_0}^{\frac{w - f_0}{2r} + f_0} df_1 \, p(f_1) \int_0^{f_1} df_2 \, p(f_2) \int_0^{2 f_1 - f_0} df_3 \, p(f_3) \qquad (36)$$

and

$$\mathcal{P}(w_{0,0} \leq w) = 2 \int_0^{\infty} \prod_{i=0}^{3} df_i \, p(f_i) \, \Theta(w - w_{0,0}) \Theta(f_3 - f_0) \Theta(f_1 - f_2) \Theta(f_0 + f_3 - 2 f_1)$$

$$= 2 \int_0^{w} df_0 \, p(f_0) \int_0^{f_0} df_1 \, p(f_1) \int_0^{f_1} df_2 \, p(f_2) \int_{f_0}^{\frac{w - f_0}{r} + f_0} df_3 \, p(f_3)$$

$$+ 2 \int_0^{w} df_0 \, p(f_0) \int_{f_0}^{\frac{w - f_0}{2r} + f_0} df_1 \, p(f_1) \int_0^{f_1} df_2 \, p(f_2) \int_{2 f_1 - f_0}^{\frac{w - f_0}{r} + f_0} df_3 \, p(f_3) \qquad (37)$$



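Using these expressions, it is straightforward to see that

$$\mathcal{P}_R(w, D) = 2 \int_0^{w} df_0 \, p(f_0) \int_0^{f_0} df_1 \, p(f_1) \int_0^{f_1} df_2 \, p(f_2) \int_{f_0}^{\frac{w - f_0}{r} + f_0} df_3 \, p(f_3)$$

$$+ 2 \int_0^{w} df_0 \, p(f_0) \int_{f_0}^{\frac{w - f_0}{2r} + f_0} df_1 \, p(f_1) \int_0^{f_1} df_2 \, p(f_2) \int_0^{\frac{w - f_0}{r} + f_0} df_3 \, p(f_3) \qquad (38)$$

Taking the derivative of the last expression with respect to w, we obtain the distribution $P_R(w, D)$ that the record
value equals w. For $p(f) = e^{-f}$, the distribution $P_R(w, D)$ is given by

$$P_R(w, D) = \frac{e^{-4w} + 2 e^{-\frac{3w}{2r}} - e^{-\frac{2w}{r}}}{1 - 2r} - \frac{2 e^{-\frac{w}{2r}}}{1 - 4r} + \frac{e^{-2w} (3 - 8r)}{1 - 6r + 8r^2} + \frac{r e^{-\frac{w}{r}} - e^{-3w} (3 - 8r)}{1 - r(5 - 6r)} \qquad (39)$$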

The above result for the record value distribution is compared with the extreme value distribution $P_E(w, D)$ given
by (18) in Fig. 4 for two values of r. Though the record fitness is also the extreme fitness in shell D, the converse
is not true and the distribution $P_R(w, D) < P_E(w, D)$ for all w at a given D. We also note that the most probable
record value in shell D is smaller than the corresponding extreme value; this behavior is unlike that for uncorrelated
fitnesses for which record is a maximum of a larger set of independent variables.
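
As a consistency check on the record value distribution for $p(f) = e^{-f}$ and even $D \leq L/2$, the Python sketch below writes the density as a sum of seven exponentials, integrates it in closed form, and compares the result with a direct simulation of the record conditions. The function names are hypothetical, the grouping of terms is the one obtained by differentiating the cumulative distribution, and r must avoid the values 1/4, 1/3 and 1/2 at which individual coefficients diverge.

```python
import math
import random


def _terms(r):
    """(coefficient, decay rate) pairs of the exponentials in the record value density."""
    a, b, c = 1 - 2 * r, 1 - 4 * r, 1 - 3 * r   # note (1-6r+8r^2) = a*b and (1-5r+6r^2) = a*c
    return [
        (1 / a, 4.0),
        (2 / a, 3 / (2 * r)),
        (-1 / a, 2 / r),
        (-2 / b, 1 / (2 * r)),
        ((3 - 8 * r) / (a * b), 2.0),
        (r / (a * c), 1 / r),
        (-(3 - 8 * r) / (a * c), 3.0),
    ]


def record_value_density(w, r):
    """Density of the record value at r = D/L for exponential block fitnesses."""
    return sum(coef * math.exp(-rate * w) for coef, rate in _terms(r))


def record_value_cdf(w, r):
    """Closed-form integral of the density from 0 to w."""
    return sum(coef * (1 - math.exp(-rate * w)) / rate for coef, rate in _terms(r))


def monte_carlo_cdf(w, r, n_samples=100000, seed=2):
    """Fraction of samples in which shell D holds a record whose value does not exceed w."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        f0, f1, f2, f3 = (rng.expovariate(1.0) for _ in range(4))
        fm = max(f1, f2)
        if fm > f0 and 2 * fm - f0 - f3 > 0:      # record held by w_{D,0} or w_{0,D}
            value = (1 - 2 * r) * f0 + 2 * r * fm
        elif f3 > f0 and f0 + f3 - 2 * fm > 0:    # record held by w_{0,0}
            value = (1 - r) * f0 + r * f3
        else:
            continue                              # no record in this shell
        if value <= w:
            hits += 1
    return hits / n_samples
```

For large w the closed-form integral approaches 3/4, the record occurrence probability for even $D \leq L/2$, and at finite w it should agree with the simulated fraction within sampling error.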
FIG. 4: (Color online) The probability distribution of the extreme value (solid lines) given by (18) and record value (dashed
lines) by (39) for r = 0.1 (left curves) and r = 0.4 (right curves) for $p(f) = e^{-f}$.

Odd $D$: To find the record value distribution for odd $D$, besides $\mathcal{P}(w_{D,0} \le w)$, we require the cumulative probability
$\mathcal{P}(w_{1,0} \le w)$ that the fitness $w_{1,0}(D)$ in shell $D$ does not exceed $w$. The latter can be written as



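\begin{equation}
\begin{aligned}
\mathcal{P}(w_{1,0} \le w) &= \int_0^\infty \prod_{i=0}^{3} df_i\, p(f_i)\, \Theta(w - w_{1,0})\, \Theta(f_1 - f_2)\, \Theta(f_0 + f_3 - 2f_1)\, \Theta(f_1 - f_0)\, \Theta(f_3 - f_0) \\
&= \int_0^{w} df_0\, p(f_0) \int_{f_0}^{\frac{L(w - f_0)}{2D} + f_0} df_1\, p(f_1) \int_0^{f_1} df_2\, p(f_2) \int_{2f_1 - f_0}^{\frac{Lw - 2f_1 - (L - D - 1) f_0}{D - 1}} df_3\, p(f_3)
\end{aligned}
\tag{40}
\end{equation}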

which reduces to the second integral in (37) for $L \gg 1$. Thus for large $L$, the cumulative distribution $\mathcal{P}_r(w, D)$ for
odd $D$ is also a function of $r$. However, unlike the extreme value distribution for odd $D$, the distributions for even and odd
$D$ do not match for $L \gg 1$ as the expressions for the distributions $\mathcal{P}(w_{1,0} \le w)$ and $\mathcal{P}(w_{0,0} \le w)$
do not coincide.



3. Distribution of the number of records 



To find the probability $N_r(n)$ that the total number of records equals $n$, we first calculate the record configuration
probability $Q(\{w_{n_1,n_2}(D)\})$ defined as the probability that all the elements in the set $\{w_{n_1,n_2}(D)\}$ are records. This
distribution depends on the location of the global maximum. If $f_0$ is the largest block fitness, the global maximum
occurs at $D = 0$ and obviously there are no records beyond $D = 0$ in this case.

When $f_0$ is not a global maximum and $f_1 > f_2$, only four record configurations occur with a nonzero probability.
When the fittest block has a fitness $f_1$, a record cannot occur beyond $D = L/2$ and only the conditions in (21) are
satisfied since $2f_1 - f_0 - f_3$ must be positive. Thus the fitness $w_{D,0}(D)$ for all $D \le L/2$ is a record with probability

\begin{equation}
Q(w_{1,0}(1), \ldots, w_{L/2,0}(L/2)) = \frac{1}{4}
\tag{41}
\end{equation}

When the block fitness $f_3$ is the largest, the records occur until $D = L$ at a spacing of one or two depending on the
sign of $f_1 - f_0$ as explained below:

(i) From the discussion at the beginning of Sec. IV B, it is evident that when $f_1 < f_0$, the only set of fitnesses that
can be a record are $w_{0,0}(D)$ for all $D$. Using the conditions in (24), it follows that when $f_2 < f_1 < f_0 < f_3$, a record
occurs only in even $D$ shells. As the $f_i$'s are independent and identically distributed (i.i.d.) random variables, all $4!$ block
fitness configurations are equally likely and therefore we get

\begin{equation}
Q(w_{0,0}(2), w_{0,0}(4), \ldots, w_{0,0}(L)) = \frac{1}{24}
\tag{42}
\end{equation}

(ii) If $f_1 > f_0$ (and $f_2$), the fitness $w_{1,0}(1)$ is a record. The next record depends on the sign of $2f_1 - f_0 - f_3$. From
(24) and (26), it follows that if $2f_1 - f_0 - f_3 < 0$, the fitness $w_{0,0}(D)$ is a record for all even $D$ and $w_{1,0}(D)$ for all
odd $D$ with probability

\begin{equation}
Q(w_{1,0}(1), w_{0,0}(2), \ldots, w_{1,0}(L-1), w_{0,0}(L)) = \int_l^{u} df_0\, p(f_0) \int_{f_0}^{u} df_1\, p(f_1) \int_l^{f_1} df_2\, p(f_2) \int_{2f_1 - f_0}^{u} df_3\, p(f_3)
\tag{43}
\end{equation}

If $2f_1 - f_0 - f_3 > 0$, due to (21) and (22), the fitnesses $w_{D,0}(D)$ for all $D \le L/2$ and $w_{L-D,0}(D)$ for all $D > L/2$ are
records. This event occurs with probability

\begin{equation}
Q(w_{1,0}(1), \ldots, w_{L/2,0}(L/2), w_{L/2-1,0}(L/2+1), \ldots, w_{0,0}(L)) = \int_l^{u} df_0\, p(f_0) \int_{f_0}^{u} df_1\, p(f_1) \int_l^{f_1} df_2\, p(f_2) \int_{f_1}^{2f_1 - f_0} df_3\, p(f_3)
\tag{44}
\end{equation}

From the above discussion, it is evident that the total number of records (ignoring the one at $D = 0$) can be
either $L/2$ (due to (41) and (42)) or $L$ (see (43) and (44)). The probability $N_r(n)$ of total number $n$ of records is
independent of the underlying block fitness distribution and is given by

\begin{equation}
N_r(L/2) = 2\left(\frac{1}{4} + \frac{1}{24}\right) = \frac{7}{12}, \qquad N_r(L) = \frac{2}{12} = \frac{1}{6}
\tag{45}
\end{equation}

where we have used that twice the sum of (43) and (44) equals (35). The average number $\mathcal{R}$ of records can be found
using $N_r(n)$ or $P_r(D)$ and is given by

\begin{equation}
\mathcal{R} = \sum_{n=1}^{L} n\, N_r(n) = \sum_{D=1}^{L} P_r(D) = \frac{11 L}{24}
\tag{46}
\end{equation}

for any even $L$.
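
The counting behind (45) and (46) can be checked by brute force for small $L$. The Python sketch below is an illustration and not part of the original analysis. It assumes that the sequence fitness is the block average $w = (2/L)\sum_i n_i f_i$ with $n_1 + n_2 + 2 n_3 = D$ (a normalisation chosen to be consistent with $w_{0,0}(0) = f_0$), and the helper names `shell_max` and `count_records` are hypothetical. Since the number of records depends only on the ordering of the block fitnesses, it enumerates the $4!$ orderings of four distinct values with exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def shell_max(L, f, D):
    """Largest fitness in shell D, assuming w = (2/L) * sum_i n_i f_i."""
    f0, f1, f2, f3 = f
    half = L // 2                      # number of blocks of length two
    best = None
    for n3 in range(D // 2 + 1):       # blocks carrying two mutations
        singles = D - 2 * n3           # blocks carrying one mutation
        n0 = half - singles - n3       # unmutated blocks
        if n0 < 0:
            continue
        # every singly mutated block takes the fitter of f1 and f2
        w = Fraction(2, L) * (n0 * f0 + singles * max(f1, f2) + n3 * f3)
        if best is None or w > best:
            best = w
    return best


def count_records(L, f):
    """Number of shells D >= 1 whose maximum beats all earlier shells."""
    running, n = shell_max(L, f, 0), 0
    for D in range(1, L + 1):
        w = shell_max(L, f, D)
        if w > running:
            running, n = w, n + 1
    return n


L = 8
tally = Counter(count_records(L, f) for f in permutations((1, 2, 4, 8)))
dist = {n: Fraction(c, 24) for n, c in tally.items()}
mean = sum(n * p for n, p in dist.items())
print(dist, mean)
```

The block fitness values $(1, 2, 4, 8)$ are an arbitrary choice that avoids ties; any four distinct values with $f_0 + f_3 \neq 2 f_1$ and $f_0 + f_3 \neq 2 f_2$ would serve.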



C. Reaching the global maximum 

As discussed in Sec. II, all records are contenders for being a leader; however only those records for which the
overtaking time is minimised qualify to be a jump [16]-[18]. Like records, the statistics of jumps depends on the
location of the global maximum. If $f_0$ is the fittest block, the unmutated sequence with fitness $w_{0,0}(0) = f_0$ is the
leader throughout.

If $f_1 (> f_2)$ is the global maximum, the last record and hence the last jump occurs at $D = L/2$. Since the time of
intersection $T(0, D)$ of the population $E(D, t)$, $D \le L/2$ with the population $E(0, t)$ given by

\begin{equation}
T_1 = T(0, D) = \frac{D}{w_{D,0}(D) - w_{0,0}(0)} = \frac{L}{2(f_1 - f_0)}, \qquad D \le L/2
\tag{47}
\end{equation}

is independent of $D$, all the populations overtake the population of the initial sequence at the same point. Thus all
the record populations participate in the evolutionary race. But as the population $E(L/2, t)$ has the largest fitness,
it becomes the final leader thus leading to a single jump when $f_1$ (or $f_2$) is the largest fitness.

If the global maximum is $f_3$ which occurs at $D = L$, the following cases as discussed in Sec. IV B 3 arise:
(i) If $f_1 < f_0$, the population with the record fitness $w_{0,0}(D)$, $D \le L$ overtakes that with the initial fitness $w_{0,0}(0)$
at a time given by

\begin{equation}
T_{3,1} = T(0, D) = \frac{D}{w_{0,0}(D) - w_{0,0}(0)} = \frac{L}{f_3 - f_0}, \qquad D \le L
\tag{48}
\end{equation}

so that all the populations with record fitness $w_{0,0}(D)$ intersect at the same time and the population of the global
maximum at $D = L$ takes over in a single jump.
(ii) If $f_1 > f_0$ and $2f_1 - f_0 - f_3 < 0$, the population with fitness $w_{0,0}(D)$ for all even $D$ and $w_{1,0}(D)$ for all odd $D$
intersects $E(0, t)$ at the following intersection time:

\begin{equation}
T(0, D) = \frac{D}{w_{1,0}(D) - w_{0,0}(0)} = \frac{L D}{(D - 1) f_3 + 2 f_1 - (D + 1) f_0} \qquad (\text{odd } D)
\tag{49}
\end{equation}

while for even $D$ the intersection time is $L/(f_3 - f_0)$ as in (48). By virtue of the condition $2f_1 - f_0 - f_3 < 0$, the
intersection time for odd $D$ is greater than that for even $D$. Therefore
the current leader at $D = 0$ is overtaken by $D = L$ resulting in a single jump at time

\begin{equation}
T_{3,2} = \frac{L}{f_3 - f_0}
\tag{50}
\end{equation}

If $2f_1 - f_0 - f_3 > 0$, the record fitnesses are $w_{D,0}(D)$ for $D \le L/2$ and $w_{L-D,0}(D)$ for $D > L/2$. The populations
corresponding to these fitnesses overtake the leader at $D = 0$ at time

\begin{equation}
T(0, D) = \frac{L}{2(f_1 - f_0)}, \qquad D \le L/2
\tag{51}
\end{equation}

As the intersection time for $D \le L/2$ is minimum amongst the rest and $w_{L/2,0}(L/2)$ is the largest fitness among these, the first
jump occurs when the population of the sequence with fitness $w_{L/2,0}(L/2)$ overtakes $E(0, t)$. The next change in
leader occurs at the point of intersection of populations involving the fitness $w_{L-D,0}(D)$, $D > L/2$ with the current
leader at a time

\begin{equation}
T_{3,3} = T(L/2, D) = \frac{L}{2(f_3 - f_1)}, \qquad D > L/2
\tag{53}
\end{equation}

which is again $D$ independent. Thus the population $E(L, t)$ is the leader after $E(L/2, t)$ and the global maximum is
reached in two jumps.
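
The case analysis above can be reproduced mechanically for a given set of block fitnesses. The following Python sketch is an illustration under stated assumptions rather than the procedure used in the paper: it assumes the sequence fitness $w = (2/L)\sum_i n_i f_i$, takes the overtaking time between shells $D'$ and $D > D'$ to be $(D - D')/(w(D) - w(D'))$ as in (47) and (53), and breaks ties in the overtaking time in favour of the fittest contender. The function names `shell_max` and `jumps` are hypothetical. Exact rational arithmetic is used so that the equal intersection times are recognised as ties.

```python
from fractions import Fraction


def shell_max(L, f, D):
    """Largest fitness in shell D, assuming w = (2/L) * sum_i n_i f_i."""
    f0, f1, f2, f3 = f
    half = L // 2
    best = None
    for n3 in range(D // 2 + 1):       # blocks carrying two mutations
        singles = D - 2 * n3           # blocks carrying one mutation
        n0 = half - singles - n3       # unmutated blocks
        if n0 < 0:
            continue
        w = Fraction(2, L) * (n0 * f0 + singles * max(f1, f2) + n3 * f3)
        if best is None or w > best:
            best = w
    return best


def jumps(L, f):
    """Return the list of (shell D, time) at which the leader changes."""
    w = [shell_max(L, f, D) for D in range(L + 1)]
    leader, out = 0, []
    while True:
        # contenders: fitter shells farther away, ranked by overtaking time,
        # with equal times resolved in favour of the larger fitness
        cands = [((D - leader) / (w[D] - w[leader]), -w[D], D)
                 for D in range(leader + 1, L + 1) if w[D] > w[leader]]
        if not cands:
            return out
        t, _, D = min(cands)
        out.append((D, t))
        leader = D


two = jumps(8, (1, 3, 0, 4))   # f3 largest, f1 > f0, 2*f1 - f0 - f3 > 0
one = jumps(8, (1, 2, 0, 5))   # f3 largest, f1 > f0, 2*f1 - f0 - f3 < 0
print(two, one)
```

For $L = 8$ and $(f_0, f_1, f_2, f_3) = (1, 3, 0, 4)$ the leader changes at $D = 4$ and then at $D = 8$, at times $L/[2(f_1 - f_0)] = 2$ and $L/[2(f_3 - f_1)] = 4$, while for $(1, 2, 0, 5)$ there is a single jump to $D = 8$ at time $L/(f_3 - f_0) = 2$.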



1. Distribution of the number of jumps

It is obvious that when any block fitness other than $f_0$ is the globally largest fitness, there will be at least one jump
(corresponding to the globally fittest being the final leader) so that the probability of at least one jump equals 3/4. In
addition, there can be one more jump when $f_3$ is the global maximum and $2f_1 - f_0 - f_3 > 0$ (see (53)). Due to (44),
the probability $p_2$ of the second jump is given by

\begin{equation}
p_2 = 2\int_l^{u} df_0\, p(f_0) \int_{f_0}^{u} df_1\, p(f_1) \int_l^{f_1} df_2\, p(f_2) \int_{f_1}^{u} df_3\, p(f_3)\, \Theta(2f_1 - f_0 - f_3)
\tag{54}
\end{equation}

Thus the average number of jumps is given by $(3/4) + p_2$. As $p_2$ is independent of $L$, the average number of jumps
is of order unity for any underlying distribution but the constant $p_2$ is not universal. For instance, when the block
fitnesses are chosen from an exponential probability distribution, $p_2 = 5/72 \approx 0.069$ while for uniform distribution, it
equals $5/48 \approx 0.104$.
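
The two quoted values of $p_2$ can be checked by direct sampling of the event described above, namely that $f_3$ is the global maximum, the fitter of $f_1$ and $f_2$ exceeds $f_0$, and twice that fitness exceeds $f_0 + f_3$. The Python sketch below is only a numerical illustration; the function names, the sample size and the seed are arbitrary choices and not taken from the paper.

```python
import random


def second_jump(f0, f1, f2, f3):
    """True when the block fitnesses produce a second jump."""
    m = max(f1, f2)                       # fitter singly mutated block
    return f3 > max(f0, m) and m > f0 and 2 * m - f0 - f3 > 0


def estimate_p2(draw, samples=200_000, seed=1):
    """Monte Carlo estimate of p2 for i.i.d. block fitnesses from `draw`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        f = [draw(rng) for _ in range(4)]
        hits += second_jump(*f)
    return hits / samples


p2_exp = estimate_p2(lambda rng: rng.expovariate(1.0))   # p(f) = exp(-f)
p2_uni = estimate_p2(lambda rng: rng.random())           # uniform on [0, 1)
print(p2_exp, 5 / 72)
print(p2_uni, 5 / 48)
```

With this sample size the statistical error is below $10^{-3}$, so the estimates should agree with $5/72$ and $5/48$ to about two significant digits.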

2. Temporal jump distribution 

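We are interested in the probability $P(t)$ that the last jump occurs at time $t > 0$, shown in Fig. 5 for $p(f) = e^{-f}$.
This distribution is a sum of the probability $P_A(t)$ that the last jump occurs at $t$ when $f_1$ or $f_2$ is a global maximum
and $P_B(t)$ when $f_3$ is a global maximum. We first consider the cumulative probability $\mathcal{P}_A(t) = \int_0^t dt'\, P_A(t')$ which on
using that $f_1$ (or $f_2$) is a global maximum and (47) gives

\begin{equation}
\mathcal{P}_A(t) = 2\int_l^{u} \prod_{i=0}^{3} df_i\, p(f_i)\, \Theta(t - T_1)\, \Theta(f_1 - f_0)\, \Theta(f_1 - f_2)\, \Theta(f_1 - f_3)
\tag{55}
\end{equation}

\begin{equation}
\phantom{\mathcal{P}_A(t)} = 2\int_{l+\epsilon}^{u} df_1\, p(f_1) \int_l^{f_1 - \epsilon} df_0\, p(f_0) \int_l^{f_1} df_2\, p(f_2) \int_l^{f_1} df_3\, p(f_3)
\tag{56}
\end{equation}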
FIG. 5: (Color online) Log-log plot of the distribution $P(t)$ of the last jump for $p(f) = e^{-f}$ and $L = 100$. The broken line has
a slope $-2$.



Differentiating $\mathcal{P}_A(t)$ with respect to time $t$ yields

\begin{equation}
P_A(t) = \frac{d\mathcal{P}_A(t)}{dt} = \frac{L}{t^2} \int_{l+\epsilon}^{u} df\, p(f)\, p(f - \epsilon) \left[ \int_l^{f} dg\, p(g) \right]^2
\tag{57}
\end{equation}

where we have defined $\epsilon = L/(2t)$. For large times $t \gg L/2$, the integral on the right hand side of the above equation
reduces to the probability $G(0)$ that the gap between the globally largest and the second largest in a set of i.i.d.
random variables is zero [13]. Thus the probability $P_A(t)$ decays as $\sim L\, G(0)/t^2$ at large times.

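When $f_3$ is the largest fitness (and $f_1 > f_2$), the last jump can occur at times given by (48), (50) and (53). As
$T_{3,1} = T_{3,2}$, the corresponding conditions (discussed in Sec. IV B 3) on the block fitnesses can be combined to give the
following cumulative probability

\begin{equation}
\mathcal{P}_1(t) = \int_{l+2\epsilon}^{u} df_3\, p(f_3) \int_l^{f_3 - 2\epsilon} df_0\, p(f_0) \int_l^{(f_0 + f_3)/2} df_1\, p(f_1) \int_l^{f_1} df_2\, p(f_2)
\tag{58}
\end{equation}

and the probability distribution

\begin{equation}
P_1(t) = \frac{L}{t^2} \int_{l+2\epsilon}^{u} df_3\, p(f_3)\, p(f_3 - 2\epsilon) \int_l^{f_3 - \epsilon} df_1\, p(f_1) \int_l^{f_1} df_2\, p(f_2)
\tag{59}
\end{equation}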

which also decays as $1/t^2$ at large times. An expression for the distribution for the last jump time $T_{3,3}$ can also be
written down in an analogous manner and reads as

\begin{equation}
P_2(t) = \frac{L}{2t^2} \int_{l+\epsilon}^{u-\epsilon} df_1\, p(f_1)\, p(f_1 + \epsilon) \int_l^{f_1 - \epsilon} df_0\, p(f_0) \int_l^{f_1} df_2\, p(f_2) \;\sim\; \frac{L}{t^2}\, G(0) \qquad (t \gg L)
\tag{60}
\end{equation}

Clearly the distribution $P_B(t) = 2(P_1(t) + P_2(t)) \sim t^{-2}$. Thus the probability distribution $P(t) = P_A(t) + P_B(t)$ obeys
the inverse square law for any block fitness distribution.
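
An inverse square density implies that the probability that the last jump occurs later than $t$ falls off as $1/t$, so it should halve when $t$ is doubled at large times. The Python sketch below checks this numerically for exponentially distributed block fitnesses. It is an illustrative check, not the simulation used for Fig. 5: the function `last_jump_time` simply encodes the jump times (47), (48), (50) and (53) together with the conditions under which each applies, and the sample size, seed and the times $10L$ and $20L$ are arbitrary choices.

```python
import random


def last_jump_time(L, f0, f1, f2, f3):
    """Time of the last jump, or None if the initial sequence always leads."""
    m = max(f1, f2)                      # fitter singly mutated block
    if f0 >= max(m, f3):
        return None                      # f0 is the global maximum
    if m > f3:
        return L / (2 * (m - f0))        # Eq. (47): f1 or f2 is the maximum
    if m < f0 or 2 * m - f0 - f3 < 0:
        return L / (f3 - f0)             # Eqs. (48) and (50): single jump
    return L / (2 * (f3 - m))            # Eq. (53): second of two jumps


def survival(times, t):
    """Fraction of samples whose last jump occurs later than t."""
    return sum(1 for T in times if T is not None and T > t) / len(times)


rng = random.Random(7)
L = 100
times = [last_jump_time(L, *(rng.expovariate(1.0) for _ in range(4)))
         for _ in range(200_000)]
s1 = survival(times, 10 * L)
s2 = survival(times, 20 * L)
ratio = s2 / s1
print(s1, s2, ratio)
```

A ratio close to $1/2$ is what the $t^{-2}$ law predicts; deviations of a few percent are expected from sampling noise and from the finite value of $\epsilon = L/(2t)$.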



CONCLUSIONS 



In this article, we studied a deterministic model [Tg| describing the evolution of a population of self-replicating 
sequences on a class of strongly correlated fitness landscapes with several fitness peaks Q. The broad questions 
addressed in this paper have been studied on completely uncorrelated fitness landscapes in previous works [16]-[18].
Here we are interested in finding how the various evolutionary properties are affected when the sequence fitnesses are 
correlated. 

We are primarily interested in the evolutionary dynamics and in particular, the properties of jumps that occur in
the population fitness when the most populated sequence changes. As discussed in Sec. II, only the largest fitness at a
constant Hamming distance from the initial sequence needs to be considered for this purpose. This led us to
consider the problem of the extreme statistics of correlated random variables [13], which has been much less studied
than its uncorrelated counterpart. We found that the extreme value distribution is not of the Gumbel form which is
obtained when the random variables are i.i.d. and their distribution decays faster than a power law. In fact, we expect
that the universal scaling distributions which depend only on the nature of the tail of the underlying distribution do
not exist for such correlated random variables as the number of independent variables, namely the block fitnesses, is
too small.

As the minimum requirement for a sequence to qualify as a leader is that it must be a record, we also studied several
record properties of correlated variables. Recently the statistics of record events when the number of observations
added at each time step increases either deterministically [13] or stochastically [sij has been studied. The records
defined in the shell model are an example of the former category as the number of observations changes as $\binom{L}{D}$ with
$D$. It was shown that the probability for a record to occur in a shell with $D$ mutations is not a continuous function,
unlike the record distributions for independent random variables [T?!; however the universality property that the
distribution is independent of the block fitness distribution continues to hold. The average number of records was found
to increase linearly with $L$ as in the maximally uncorrelated case, for which the prefactor $(1 - \ln 2) \approx 0.306$
is smaller than that in (46).

In the uncorrelated fitness model, the $L$ dependence of the average number of jumps was seen to depend on the class
of the fitness distribution $p(f)$. For $p(f)$ decaying faster than a power law, the average number of jumps increased
as $\sqrt{L}$ [16], [17]. In contrast, here the average number of jumps was shown to be independent of $L$ for any choice of
block fitness distribution $p(f)$ although the value of the constant was found to be nonuniversal. These results suggest
that for block fitness distributions decaying faster than a power law, the average number of records increases but the
average number of jumps decreases with increasing correlations. It is also interesting to see how the average number
of jumps changes when the block fitness depends not only on the block configuration but also on its position in the
sequence [?!]. The result of our numerical simulations for this general model shows that the average number of jumps
increases linearly with the number of blocks. Moreover, the prefactor is given by the average number of jumps obtained
in the locus-independent block fitness model, namely $(3/4) + p_2$ (see Fig. 6). This suggests that the different blocks
behave independently in the locus-dependent block fitness model; a detailed understanding of this model is beyond
the scope of this article and will be presented elsewhere.

The temporal distribution for the last jump to occur at time $t$ obeys a $t^{-2}$ law for infinite (and finite) populations
evolving on uncorrelated fitness landscapes [16]-[18]. Here we have shown that on a class of strongly correlated fitness
landscapes, the same law is obeyed. The origin of this power law can be understood using a simple scaling argument
when the fitness variables are independent variables [16] but it is not obvious at the outset that such an argument
can be used here since the sequence fitnesses are correlated. But it turns out that the jump time involves the i.i.d.
block fitnesses and therefore the $t^{-2}$ law is obtained here as well.

We close this article with a discussion of the deterministically evolving populations of infinite size studied here
vis-à-vis finite populations that are subject to stochastic fluctuations on multi-peaked fitness landscapes. As discussed
in Sec. II, the basic difference between a finite and an infinite population is that while the former has a finite
mutational spread in the sequence space, all the mutants are available at all times in the deterministic case. As a
consequence, on rugged fitness landscapes, a finite population can get trapped at a local optimum from which it can
escape by tunneling through a fitness valley [T^]. In fact at late times, most of the population passes exclusively
through the local fitness peaks and thus such sequences are the most populated ones when the population size is
finite. In contrast, as the entire sequence space is occupied for an infinite population, a transition to a higher fitness
peak takes place by overtaking the less fit populations as explained in Sec. II. Thus the underlying mechanism
for the punctuated increase of fitness on fitness landscapes with multiple peaks is different in the two situations
[isj . Moreover the most populated sequence involved in the jump event is not necessarily a local maximum (for any
correlation) for infinite populations. To see this, consider the fittest sequence with fitness $w^{(max)}(D)$ at Hamming
distance $D$ from the initial sequence $\sigma^{(0)}$. Barring the initial sequence, all the one-mutant neighbors of the sequence with
fitness $w^{(max)}(1)$ are at Hamming distance two from the initial sequence. Consider the scenario when the sequence
with fitness $w^{(max)}(2)$ is a nearest neighbor of the sequence with fitness $w^{(max)}(1)$. Then the fittest sequence at distance
unity from the initial sequence can be a jump if at least $w^{(max)}(1) > w(\sigma^{(0)})$ and the minimum intersection time
condition $(w^{(max)}(1) - w(\sigma^{(0)}))^{-1} < 2\,(w^{(max)}(2) - w(\sigma^{(0)}))^{-1}$ is obeyed. Clearly the latter condition rewritten as
$w^{(max)}(2) - w^{(max)}(1) < w^{(max)}(1) - w(\sigma^{(0)})$ can be satisfied even when $w^{(max)}(1)$ is not a local maximum. Thus the
number of jump events is not related to the number of local optima for an infinite population.
FIG. 6: (Color online) Average number of jumps as a function of $B$ for the block model with locus-dependent block fitness
chosen from exponential distribution. The line has a slope given by $3/4 + p_2 = 0.819$.